
Violent scene detection using a super descriptor tensor decomposition


Abstract

This article presents a new method for violent scene detection using super descriptor tensor decomposition. Multi-modal local features comprising auditory and visual features are extracted from Mel-frequency cepstral coefficients (including first- and second-order derivatives) and refined dense trajectories. A large number of dense trajectories is usually extracted from a video sequence; some of these trajectories are unnecessary and can degrade accuracy. We propose to refine the dense trajectories by selecting only discriminative trajectories in the region of interest. Visual descriptors consisting of histograms of oriented gradients and motion boundary histograms are computed along the refined dense trajectories. In traditional bag-of-visual-words techniques, the feature descriptors are concatenated to form a single large feature vector for classification. This destroys the spatio-temporal interactions among features extracted from multi-modal data. To address this problem, a super descriptor tensor decomposition is proposed. The extracted feature descriptors are first encoded using the super descriptor vector method. The encoded features are then arranged as tensors so as to retain the spatio-temporal structure of the features. To obtain a compact set of features for classification, the Tucker-3 decomposition is applied to the super descriptor tensors, followed by feature selection using Fisher feature ranking. The resulting features are fed to a support vector machine classifier. Experimental evaluation is performed on the violence detection benchmark dataset MediaEval VSD2014. The proposed method outperforms most of the state-of-the-art methods, achieving MAP2014 scores of 60.2% and 67.8% on two subsets of the dataset.
